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Abstract  This  work  develops  a  global  minimiza¬ 
tion  framework  for  segmentation  of  high  dimen¬ 
sional  data  into  two  classes.  It  combines  recent 
convex  optimization  methods  from  imaging  with 
recent  graph  based  variational  models  for  data  seg¬ 
mentation.  Two  convex  splitting  algorithms  are 
proposed,  where  graph-based  PDE  techniques  are 
used  to  solve  some  of  the  subproblems.  It  is  shown 
that  global  minimizers  can  be  guaranteed  for  semi- 
supervised  segmentation  with  two  regions.  If  con¬ 
straints  on  the  volume  of  the  regions  are  incorpo¬ 
rated,  global  minimizers  cannot  be  guaranteed,  but 
can  often  be  obtained  in  practice  and  otherwise  be 
closely  approximated.  Experiments  on  benchmark 
data  sets  show  that  our  models  produce  segmen¬ 
tation  results  that  are  comparable  with  or  outper¬ 
form  the  state-of-the-art  algorithms.  In  particular, 
we  perform  a  thorough  comparison  to  recent  MBO 
and  phase  field  methods,  and  show  the  advantage 
of  the  algorithms  proposed  in  this  paper. 
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1  Introduction 

We  consider  the  problem  of  clustering  general 
high-dimensional  data  into  two  classes.  The  data 
points  are  viewed  as  nodes  on  a  graph  and  the  sim¬ 
ilarity  between  them  is  represented  by  the  weight 
function  defined  on  the  edges  between  the  nodes. 
Recently,  there  has  been  a  growing  interest  in  for¬ 
mulating  such  problems  as  variational  or  optimiza¬ 
tion  problems,  where  the  non-local  total  variation 
term  plays  a  fundamental  role  in  constructing  the 
cost  function. 

A  graphical  framework  is  often  used  to  exploit 
underlying  similarities  in  the  data  [3,12,50,54-56]. 
For  example,  spectral  graph  theory  [14,  42]  uses 
this  approach  to  perform  various  tasks  in  imaging 
and  data  clustering.  The  graph  Laplacian,  one  of 
its  fundamental  concepts,  is  described  in  section  2. 

Graph-based  formulations  have  been  used  ex¬ 
tensively  for  image  processing  applications  [6, 16, 
17,19,27,28,30,39,46].  A  typical  framework  in¬ 
volves  the  similarity  graph  where  each  two  ver¬ 
tices  are  given  a  weight  measuring  their  similar¬ 
ity.  Buades  et  al.  in  [10]  introduce  a  new  non-local 
means  algorithm  for  image  denoising  and  compare 
it  to  some  of  the  best  methods.  In  [28],  Grady  de¬ 
scribes  a  random  walk  algorithm  for  image  seg¬ 
mentation  using  the  solution  to  a  Dirichlet  prob¬ 
lem.  Elmoataz  et  al.  present  generalizations  of  the 
graph  Laplacian  [19]  for  image  denoising  and  man¬ 
ifold  smoothing.  Gouprie  et  al.  in  [16]  propose  a 
parameterized  graph-based  energy  function  that 
unifies  graph  cuts,  random  walker,  shortest  paths 
and  watershed  optimizations.  We  use  a  non-local 
calculus  formulation  [21]  to  generalize  the  contin¬ 
uous  formulation  to  a  (non-local)  discrete  setting, 
while  other  non-local  versions  for  weighted  graphs 
are  described  in  [19].  A  comprehensive  reference 
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about  casting  continuous  PDEs  in  graph  form  is 
found  in  [29]. 

Our  work  involves  semi-supervised  clustering, 
where  the  labeling  of  a  small  set  of  the  data  points 
is  provided  in  advance.  Such  problems  have  been 
studied  in  [4,40]  in  a  variational  framework,  where 
a  Ginzburg-Landau  functional  was  defined  on  the 
graph  and  minimized  by  PDE  techniques,  such  as 
phase  field  [4]  and  the  MBO  scheme  [40].  MBO 
was  originally  formulated  in  the  Euclidean  space 
as  a  numerical  scheme  for  solving  evolution  equa¬ 
tions  involving  mean  curvature  motion  of  a  re¬ 
gion  boundary  [41]  and  has  also  been  used  for  im¬ 
age  segmentation  [49] .  Since  the  energy  functional 
is  non-convex  and  the  PDEs  evolve  in  the  steep¬ 
est  descent  direction,  these  approaches  may  poten¬ 
tially  get  stuck  in  unwanted  local  minima. 

On  the  other  hand,  in  unsupervised  clustering, 
there  are  no  a  priori  knowledge  of  the  labeling 
of  some  of  the  data  points.  In  such  cases,  some 
knowledge  of  the  sizes  of  each  class  is  typically 
incorporated  in  order  to  prevent  the  trivial  solu¬ 
tion  where  all  data  points  are  assigned  to  the  same 
class.  The  normalized  cut  and  the  Cheeger  ratio 
cut  [13,  46]  are  two  popular  such  cost  functions 
that  favor  clusterings  with  classes  of  equal  size. 
Recent  work  [8,  9,  32,  47]  has  focused  on  greatly 
simplifying  the  energy  landscape  in  the  Cheeger 
ratio  cut  cost  function.  For  example,  Bresson  et 
al.  in  [8]  show  rigorous  convergence  results  for  two 
algorithms  that  solve  the  relaxed  Cheeger  cut  min¬ 
imization.  Overall,  the  simplified  problems  are  still 
non-convex;  therefore,  these  approaches  may  also 
get  stuck  in  local  minima. 

It  is  well  known  in  combinatorial  optimization 
that  certain  graph  partition  problems  can  be  for¬ 
mulated  as  min-cut  problems  [20],  which  aim  to 
find  the  minimal  separation  of  the  graph  into  two 
sets,  one  of  them  containing  a  predefined  source 
node  and  the  other  a  predefined  sink  node.  The 
max-flow  problem  is  the  equivalent  dual  problem 
and  can  be  globally  optimized  by  classical  combi¬ 
natorial  algorithms  such  as  Ford-Fulkerson  [20]  or 
the  push-relabel  method  [24].  Specialized  versions 
of  such  algorithms  have  recently  become  popular 
for  solving  certain  optimization  problems  in  image 
processing  and  computer  vision  [5,6,36]. 

There  are  some  fundamental  differences  between 
the  imaging  applications  and  more  general  clus¬ 
tering  problems  on  graphs.  In  imaging,  most  of 
the  data  is  incorporated  in  a  strong  fidelity  term, 
which  measures  how  well  each  pixel  hts  to  each 
region.  Edges  are  also  defined  between  neighbor¬ 
ing  data  points  on  a  regular  grid,  but  they  are 
mainly  used  for  smoothing  purposes.  In  more  gen¬ 
eral  clustering  problems,  the  data  is  mainly  incor¬ 


porated  on  edges  between  pairs  of  data  points,  and 
the  fidelity  term  is  zero  at  the  majority  of  the  data 
points. 

The  combinatorial  max-flow  algorithm  of  [5] 
was  developed  with  specific  imaging  problems  and 
a  regular  grid  in  mind.  We  anticipate  that  it  is 
not  easily  transferable  to  graph  clustering  while 
still  maintaining  a  high  efficiency;  one  reason  be¬ 
ing  that  the  paths  between  the  source  node  and 
sink  node  are  much  longer.  Such  combinatorial  al¬ 
gorithms  also  do  not  exploit  PDE  techniques  to 
approximate  the  large  graph,  like  the  approximate 
eigendecomposition  of  the  graph  Laplacian  as  in 
[4,40]. 

This  paper  proposes  an  efficient  global  opti¬ 
mization  framework  for  semi-supervised  clustering 
problems,  formulated  in  the  same  variational  form 
as  [4,40].  Instead  of  applying  classical  combina¬ 
torial  algorithms,  we  build  on  more  recent  work 
from  imaging,  which  formulates  two  class  partition 
problems  as  convex  variational  problems  [7,11,25] 
or  variational  min-cut /max- flow  problems  [52,53]. 
Convex  optimization  algorithms  were  used  in  [7, 
25,52,53]  to  split  the  problems  into  simpler  sub¬ 
problems,  each  of  which  could  be  solved  by  PDE 
techniques.  In  this  paper,  we  describe  the  exten¬ 
sion  of  the  variational  min- cut /max- flow  duality 
in  [52,  53]  and  of  the  algorithm  in  [25,  51]  to  a 
more  general  graph  setting  to  solve  a  more  gen¬ 
eral  clustering  problem.  Here,  the  two  global  min¬ 
imization  methods  are  referred  to  as  “max-flow” 
and  “primal  augmented  Lagrangian”  algorithms, 
respectively.  The  new  subproblems  are  solved  by 
graph-based  PDE  techniques.  We  also  show  how 
constraints  on  the  size  of  each  class  can  be  incor¬ 
porated  by  a  small  modification  of  the  max-flow 
algorithm. 

Our  global  minimization  algorithms  are  tested 
on  several  benchmark  data  sets,  and  we  compare 
them  with  the  phase  held  [4]  and  MBO  [40]  al¬ 
gorithms  as  well  as  with  each  other.  One  notable 
hnding  is  that  if  the  known  data  points  are  not 
distributed  relatively  uniformly  among  the  entire 
data  set,  the  local  minimization  methods  may  have 
difficulty  hnding  the  correct  solution. 

The  paper  is  organized  as  follows.  In  section 
2,  we  review  the  graphical  framework  and  some 
previous  related  work.  In  sections  3.1  and  3.2,  we 
present  our  two  novel  graph-based  clustering  meth¬ 
ods.  The  results  are  shown  in  section  4.  In  addition, 
in  section  5,  we  present  a  slight  modihcation  of  the 
algorithm  in  section  3.1  to  incorporate  the  size  of 
the  two  classes,  and  we  also  note  the  results  of  the 
method.  We  conclude  in  section  6. 
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2  The  Graphs  Pramework 

We  consider  the  data  as  vertices  on  a  graph. 
Let  G  be  an  undirected  graph  G  =  {V,E),  where 
V  and  E  are  the  sets  of  vertices  and  edges,  re¬ 
spectively.  Each  edge  is  equipped  with  a  weight 
denoted  by  the  weight  function  w{x,  y),  which  sat¬ 
isfies  the  symmetric  property  and  measures  the 
similarity  between  vertices  x  and  y.  A  big  value 
of  w{x,y)  indicates  that  nodes  x  and  y  are  very 
similar,  while  a  small  value  of  w{x,  y)  indicates 
that  they  are  dissimilar,  and  thus  less  likely  to  be¬ 
long  to  the  same  class.  The  challenge  is  to  use  the 
right  weight  function-  the  one  that  measures  the 
important  classihcation  attributes  of  each  pair  of 
vertices. 

One  popular  choice  for  the  weight  function  is 
the  Gaussian 

w{x,y)=e  (1) 

where  d{x,  y)  is  some  distance  measure  between 
the  two  vertices  x  and  y,  and  cr  is  a  parameter  to 
be  chosen.  For  example,  if  the  data  set  consists  of 
points  in  d{x,  y)  can  be  the  Euclidean  distance 
between  point  x  and  point  y,  since  points  farther 
away  are  less  likely  to  belong  to  the  same  cluster 
than  points  closer  together.  For  images,  d{x,  y)  can 
be  defined  as  the  weighted  2-norm  of  the  difference 
of  the  feature  vectors  of  pixels  x  and  y,  where  the 
feature  vector  of  a  node  consists  of  intensity  values 
of  pixels  in  its  neighborhood,  as  described  in  [23]. 

Another  choice  for  the  similarity  function  used 
in  this  work  is  the  Zelnik-Manor  and  Perona  weight 
function  [44]  for  sparse  matrices: 

_  d(x,y)^ 

w{x,y)  =  e  \GEfEv)  ^  (2) 

where  the  local  parameter  t{x)  =  d{x,z)^,  and  z 
is  the  closest  vertex  to  vertex  x. 

Note  that  it  is  not  necessary  to  use  a  fully  con¬ 
nected  graph  setting,  which  might  be  a  compu¬ 
tational  burden.  Specifically,  the  fully  connected 
graph  can  be  approximated  by  a  much  smaller 
graph  by  only  including  edges  in  E  between  the 
M  nearest  neighbors  for  each  node  in  V,  i.e.  only 
the  edges  with  large  weight  w.  In  other  words,  the 
weight  between  two  vertices  that  are  not  M  near¬ 
est  neighbors  of  each  other  is  set  to  zero.  In  this 
paper,  we  make  use  of  such  an  approximation;  our 
edge  set  includes  only  edges  between  vertices  that 
are  near  to  each  other. 

The  matrix  W  is  defined  as  Wxy  =  w{x,  y)  and 
the  degree  of  a  vertex  x  &  V  as, 

d{x)  =  ^  w[x,y). 

ydV 


We  let  D  to  be  the  diagonal  matrix  with  elements 
d{x)  on  the  diagonal. 

We  use  a  graphical  framework  because  it  sim¬ 
plifies  the  processing  of  high-dimensional  data  and 
provides  a  way  to  deal  with  nonlinearly  separable 
classes. 

2.1  Well  known  operators  in  graph  form 

We  define  operators  on  graphs  in  a  similar  fash¬ 
ion  as  done  in  [21,31],  where  the  justification  for 
these  choices  is  shown. 

Assume  m  is  the  number  of  vertices  in  the  graph 
and  let  V  =  and  £  =  R™*™  be  Hilbert 
spaces  defined  via  the  following  inner  products: 

{\l)v  =  X] 

X 

{YA)s  =  \  '^Y[x,y)<l){x,y)w{x,yY‘^~^ 

x,y 

for  some  r  G  [0, 1]  and  g  G  [^j  !]■  Let  us  also  define 
the  following  norms: 


l|A|lv  =  V^(^= 


UWe  =  'JW^=  J^'^Hx,yYw{x,yY<i-^; 

V 

ll'^IU.oo  =  max|(/)(a;,y)|. 

The  gradient  operator  V  :  V  ^  ^  is  then  defined 
as: 

(VA)u,(a;,  y)  =  w{x,  y)^"'^(A(y)  -  A(a;)).  (4) 

The  Dirichlet  energy  does  not  depend  on  r  or  q: 

\  ^  E -  A(y))^.  (5) 

x,V 

The  divergence  div  :  — >■  V  is  defined  as  the  ad¬ 

joint  of  the  gradient: 

(div^  (j)){x)  =  E  y)-(t^{y,  a^)), 

(6) 

where  we  define  the  adjoint  using  the  following  def¬ 
inition:  (yu,4))e  =  — {u,  divu,  <^)v- 

We  now  have  a  family  of  graph  Laplacians  A,.  = 
divuj  V  :  V  — ^  V: 

(A^A){a:)=E^^#(A(2/)-A(x)). 

y  ^  ' 


(3) 


(7) 
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Viewing  A  as  a  vector  in  R™,  we  can  write 

-A^A  =  (Di-’’ -  D-’’W)A.  (8) 

The  case  with  r  =  0  is  the  unnormalized  Laplacian 

L  =  D-W.  (9) 

However,  the  matrix  L  is  usually  scaled  to  guaran¬ 
tee  convergence  to  the  continuum  differential  oper¬ 
ator  in  the  limit  of  large  sample  size  [4] .  Although 
several  versions  exist,  two  popular  versions  are  the 
symmetric  Laplacian 

Ls  =  D-5LD  ^  =I-D-^WD  ^  (10) 

and  the  random  walk  Laplacian  (r  =  1) 

Lrw  =  D”^L  =  I-D”^W.  (11) 

The  advantage  of  the  former  formulation  is  its  sym¬ 
metric  property  which  allows  for  more  efficient  im¬ 
plementations  . 

A  family  of  anisotropic  total  variations  T14,  : 
V  — >•  K  can  now  be  dehned: 

rt4,(A)  =  max{(div^  A)v  :  </>  G  f ^  l} 
=  i^u;(x,y)^|A(:r)-A(2/)|.  (12) 

x.y 

It  remains  to  choose  the  parameters  q  and  r. 
We  choose  9  =  1  as  in  [21],  where  it  is  shown 
that  for  any  r,  TV^  is  the  T-limit  (Gamma  con¬ 
vergence)  of  a  sequence  of  graph-based  Ginzburg- 
Landau  (GL)-type  functionals: 

p 

Theorem  1  GL^  — >  GLq  os  e  — >■  0  where 
GL,(A)  =  ||VA||^  +  i^W(A(x)) 

X 

=  -A(y))^  +  ^^lV(A(x)) 

x,y  X 

(\\  \TV.ui{\)with  q=l  /or  A  s.t.  A(x)  G  {0, 1} 
GLo[A)  —  <  „  . 

I  00  otherwise 

Proof.  See  Theorem  3.1  of  [21].  □ 

Here  W  is  the  double- well  potential  having  two 
zeros  (in  our  case  0  and  1),  and  e  is  a  small  posi¬ 
tive  number.  It  is  also  shown  in  the  paper  (specif¬ 
ically  Theorem  3.6)  that  the  addition  of  a  fidelity 
term  is  compatible  with  P -convergence.  Since  one 
of  the  algorithms  we  compare  our  methods  to  deals 
with  the  Ginzburg-Landau  functional  directly,  to 
be  consistent,  we  use  the  above  definitions  with 
<7  =  I  in  our  formulations. 


Remark.  It  is  noted  in  [21]  that  although 
the  first  term  in  the  continuous  Ginzburg-Landau 
functional 

GLc(A)  =  e  j \\7X\^dx  +  ^J  W{X)dx 

is  scaled  by  e,  the  first  term  of  GL^^  contains  no 
e.  This  occurs  because  the  Dirichlet  energy  in  GLc 
is  unbounded  for  functions  X  of  bounded  variation 
and  taking  on  two  values  of  the  minima  of  the 
double-well  potential  (almost  everywhere).  However, 
the  difference  terms  of  GL,,  are  finite  even  in  the 
case  of  binary  functions,  and  no  rescaling  of  the 
first  term  is  necessary. 

We  choose  r  =  1  because  it  results  in  a  normal¬ 
ized  random  walk  Laplacian  and  the  eigenvectors 
as  well  as  the  corresponding  eigenvalues  of  the  ma¬ 
trix  can  be  efficiently  calculated.  Although  the  ran¬ 
dom  walk  Laplacian  matrix  itself  is  not  symmetric, 
spectral  graph  theory  described  in  [14]  shows  that 
the  eigenvectors  of  the  random  walk  Laplacian  can 
be  directly  computed  from  knowing  the  diagonal 
matrix  D  and  the  eigenvectors  of  the  symmetric 
graph  Laplacian  (which  is  a  symmetric  matrix)  Lg . 
In  particular,  we  have  the  following  theorem: 

Theorem  2  IfXis  an  eigenvalue  o/L^w  with  eigen¬ 
vector  u  if  and  only  if  X  is  an  eigenvalue  of  Lg  with 
eigenvector  w  =  D^it. 

Proof.  Multiplying  the  eigenvalue  equation  LrwU  = 
Xu  by  D2  from  the  left  and  then  substituting  w  = 

D  2  u,  one  obtains  LgW  =  Xw.  □ 

We  take  advantage  of  this  property  by  calcu¬ 
lating  the  eigenvalues  and  eigenvectors  of  the  sym¬ 
metric  graph  Laplacian  (since  symmetric  matrices 
allow  for  more  efficient  implementations)  and  then 
using  this  information  to  calculate  the  same  for  the 
random  walk  Laplacian. 

To  summarize,  we  use  the  above  operator  defi¬ 
nitions  with  q  =  1  and  r  =  I. 

In  this  work,  we  use  the  notation  A(a;)  to  denote 
the  value  of  A  at  node  x  €  V  that  provides  infor¬ 
mation  about  the  class  membership  of  the  node. 
Specifically,  we  use  A(a;)  =  0  to  denote  the  fact 
that  node  x  belongs  to  class  1,  and  A(a:)  =  1  to 
denote  that  it  belongs  to  class  2. 

2.2  Partition  problems  on  graphs 

In  this  work,  we  are  interested  in  solving  par¬ 
tition  problems  of  the  form 

E  w{x,y)  (13) 

(x,y)eE  ■.  xev,  yev\s 
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under  supervised  constraints 


where 


s  2  yf,  V\S  2  V’’  (14) 

and  an  optional  volume  constraint 


|5|=a|4-|, 

where  0  <  a  <  1.  C  F  is  a  set  of  nodes  that 
are  known  a  priori  to  belong  to  the  region  S  and 
C  1^  is  a  set  of  nodes  that  are  known  to  belong 
to  region  V\S.  By  defining  a  binary  function 


A(a:)  := 


(1,  x€S 
\  0,  a:  e  y\5' 


the  above  problem  can  be  expressed  as 
minE^iX)  =TV^{X)  + 


where 


(15) 


TVwiX)  =  ^'^w{x,yy\X{x)  -  A(y)| 

x.y 

as  defined  earlier  and 


S  =  {A:yh^{0,l}}  (16) 

is  the  set  of  binary  functions  indicating  the  parti¬ 
tion.  Here,  f{X{x),x)  is  a  fidelity  term  which  in¬ 
corporates  the  supervised  constraints  (14).  It  typ¬ 
ically  takes  the  form  of 

f{X{x),x)  =  T]{x)\X{x)  -  A°(x)p,  (17) 

where  A*’  is  a  binary  function  taking  value  1  in  V-^ 
and  0  in  and  rjix)  is  a  function  that  takes  on 
a  large  constant  value  rj  on  fidelity  points  U 
and  zero  elsewhere.  If  rj  is  chosen  sufficiently  large, 
it  can  be  guaranteed  that  the  solution  A  satisfies 
the  supervised  constraints. 

In  addition,  when  the  size  of  the  two  classes  is 
known,  the  volume  of  the  regions  may  be  enforced 
to  satisfy  a  constraint  of  the  form 

^A(x)  =  a|y|,  (18) 

x^V 

where  a  is  the  fraction  of  the  nodes  belonging  to 
class  2,  e.g.  a  =  |  enforces  partitions  of  equal  vol¬ 
ume.  The  goal  of  this  is  to  create  an  algorithm  that 
requires  a  much  smaller  fidelity  set  (to  produce  an 
accurate  classification)  than  otherwise,  because  we 
also  have  the  information  about  class  size. 

In  previous  work  [4],  the  problem  (15)  was  for¬ 
mulated  as  the  minimization  of  a  Ginzburg-Landau 
(GL)  functional  on  graphs  with  a  fidelity  term 


=  y)iHx)  -  Hy)f 

^,v 

as  defined  before.  Note  that  as  e  — >■  0,  in  the  limit 
of  gamma  convergence,  the  sum  of  first  two  terms 
of  the  energy  converge  to  the  total  variation  term, 
making  the  energy  exactly  the  same  as  one  in  (15). 
The  problem  is  solved  using  gradient  descent  and 
an  efficient  convex  splitting  scheme.  This  method 
will  be  referred  to  as  “binary  GL”  in  the  paper, 
and  we  compare  it  to  our  work. 

In  [40],  (19)  is  solved  numerically  by  a  variation 
of  the  MBO  scheme  [41],  a  method  to  approximate 
motion  by  mean  curvature.  To  make  everything 
consistent  with  the  notation  and  theorems  stated 
in  the  paper,  we  include  an  extra  scaling  in  our 
implementation  of  the  method  in  [40],  and  the  jus¬ 
tification  is  described  shortly.  We  note  that  this 
change  in  the  method  did  not  exacerbate  the  re¬ 
sults  as  compared  to  those  of  the  original  method; 
in  fact,  it  produced  very  little  change  in  any  simu¬ 
lation.  This  algorithm  will  be  referred  to  as  “binary 
MBO”  in  the  paper,  and  we  compare  it  to  our 
new  algorithms.  The  discretized  version  of  the  al¬ 
gorithm  is  the  following: 

Starting  with  some  initial  classification  A  G 
{0, 1},  alternate  between  the  following  two  steps 
until  the  stopping  criterion  is  satisfied: 

1.  Heat  equation  with  forcing  term: 


A’"+^  -  A" 
dt 


2A„A"+^ 


1  df{X{x),x) 
d[xY  dX 

(20) 


2.  Thresholding: 


A"+i(a;) 


1,  if  A"+5(a;)  >  0.5, 
0,  if  A"+5(a;)  <0.5. 


(21) 


Here,  after  the  second  step,  A"+^(a:)  can  take  only 
two  values  of  1  or  0;  thus,  this  method  is  appro¬ 
priate  for  binary  segmentation. 

Following  [40],  (20)  is  solved  by  a  semi-implicit 
scheme,  where  the  Laplacian  term  is  calculated 
implicitly,  and  the  terms  are  considered  as  a  lin¬ 
ear  combination  of  the  eigenvectors  of  the  random 
walk  Laplacian. 

To  show  the  general  idea  of  the  derivation,  we 
start  with  the  Ginzburg-Landau  (GL)  functional 
on  graphs  (19).  One  can  rewrite  it  using  inner 
product  notation: 


GLYX)  =  II VA]]^  +  i  ^  W{X[x))  +  f{X{x),x), 

X 

(19) 


GLYX)  =  \\VX\\l  +  i(7A-”W(A(x)),  l)v 

+  {D-^f{X{x),x),l)v,  (22) 
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where  {D~'"W {X)){x)  =  d{x)~'^W{X{x)).  The  fac¬ 
tor  d{x)~^  is  needed  to  cancel  the  factor  d{xy  in 
the  V-  inner  product. 

The  Allen-Cahn  equation  can  then  be  derived 
using  the  V-gradient  flow  associated  with  GL^.  We 
have 

—GLyX  +  t^)\t=o  =  — 2{Au,A,7)v 

+  i(iA-’-W'(A(x)),  7)v  +  {D-^^{X{x),x),j)v. 

(23) 

The  Allen-Cahn  equation  is  then 

A(x)  =  2A.A-j^»"(AW)-^|{(AW,i). 

(24) 


is  equivalent  to  the  formulation  (15).  The  proof  is 
obvious  as  g{4>{x)^x)  =  f{4>(x),x)  for  all  binary  </>. 

The  convex  relaxed  problem  is  formulated  as 
follows: 

min  E^{X)  =  T14,(A)  -k  ^  g{X{x),  x),  (28) 

x&V 

where 

S'  =  {A  :  [0,1]}-  (29) 

The  following  theorem,  which  is  a  generaliza¬ 
tion  of  theorem  1  in  [11]  from  images  to  graphs, 
shows  that  the  minimizer  of  the  convex  problem 
can  be  thresholded  to  yield  a  binary  global  mini¬ 
mizer  of  the  original  problem. 


The  above  equation  can  be  solved  using  a  time¬ 
splitting  scheme,  where  the  splitting  occurs  so  that 
the  double-well  potential  term  is  separated.  The 
first  step  is 

A(a:)  =  2A^A- ^^|^(A(a;),x)  (25) 

and  the  second  step  (with  the  double-well  poten¬ 
tial)  is  just  thresholding  in  the  e  — ?►  0  limit.  By 
alternating  between  these  two  steps,  one  can  form 
an  approximate  solution  of  the  Allen-Cahn  equa¬ 
tion  (24). 

Such  approaches  (binary  GL  and  binary  MBO 
methods)  converge  to  the  nearest  local  minimizer 
from  a  given  initialization.  In  general,  one  can¬ 
not  guarantee  that  the  desired  global  minimizer 
is  obtained.  The  subject  of  this  work  is  to  develop 
a  convex  optimization  framework  for  minimizing 
(15)  which  is  guaranteed  to  obtain  the  global  min¬ 
imizer.  Various  algorithms  are  developed  for  solv¬ 
ing  the  convex  problems. 


3  Global  optimization  for  partition 
problems  on  graphs 

The  problem  (15)  is  non-convex  because  the 
binary  side  constraints  (16)  are  non-convex.  We 
show  that  the  binary  constraints  can  be  replaced 
by  their  convex  hull  [0, 1]  to  obtain  an  exact  convex 
formulation  as  was  shown  in  [11]  for  images.  Define 
first  the  functions 

Cs{x)  =  f{G,x) ,  Ct{x)  =  f{l,x),  Va:  G  V, 
g{(l){x),  x)  =  Gt{x)(j){x)  +  Gs{x){l  -  (j){x)) ,  \fx  eV. 

(26) 

The  problem 

minA'^(A)  =  ri4,(A) -k  ^  g(A(a;),a;)  (27) 


Theorem  3  Let  A*  be  a  minimizer  of  (28).  De¬ 
note  by  X^  :  V  {0, 1}  the  binary  function 


_  /  I’  ^ 

\o,ifxyx)<i- 


(30) 


Then  for  almost  every  t  G  (0,1],  A^  is  a  global 
minimizer  of  the  non-convex  problem  (27). 


Proof  For  any  function  X  €  B'  and  for  any  x  G  V, 
if  A(x)  G  [0, 1],  then  X^(x)  dt  =  X{x).  Therefore, 
for  each  x  G  V, 


g(Xyx),x)d£  = 

[  Xyx)Gt{x)  +  {1  -  Xyx))Gs{x)  de 
Jo 


=  X{x)Gt{x)  -k  (1  -  X{x))Gs{x)  =  g(A(x),x). 

(31) 


By  the  coarea  formula,  we  have  that 

TV^iXy  di  =  TV^iX) . 

For  a  proof  of  the  coarea  formula  on  graphs,  see 
Appendix  B  of  [22]. 

Combining  the  above  properties,  we  obtain  that 
for  any  A  G  S', 


[  EP{Xyd£=J2  [  {TV^{X^)+g{X\x),x))d£  = 
do 

Y,  TV^W  +  g(A(x), x)  =  E^iX).  (32) 
xev 

For  a  A  that  minimizes  the  energy,  clearly  E^{X)  < 
E^{Xy  for  any  £  G  (0,1].  However,  equality  (32) 
can  then  only  be  true  provided  E^{X)  =  S'^(A^) 
for  almost  every  £  G  (0, 1].  In  other  words,  A^  also 
minimizes  the  energy  for  almost  every  £  G  (0,1]. 

□ 
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In  order  to  solve  (28),  we  consider  two  main  al¬ 
gorithms.  The  first  is  based  on  solving  a  dual  for¬ 
mulation  of  the  problem,  which  can  be  identified  as 
a  maximum-flow  problem,  by  convex  optimization 
techniques.  It  will  be  referred  to  as  the  “max- flow” 
method  in  this  paper.  The  second  algorithm  solves 
the  primal  problem  by  the  augmented  Lagrangian 
technique,  and  will  be  denoted  the  “primal  aug¬ 
mented  Lagrangian”  method  in  this  paper. 

3.1  Duality-based  max- flow  formulation 

In  graph  theory,  the  max-flow  problem  [20]  aims 
to  maximize  the  flow  from  a  source  node  s  to  a 
sink  node  t,  which  are  both  connected  by  edges 
to  the  nodes  in  V.  We  let  ps,pt  :  D  >->■  M  denote 
the  flow  variables  on  sink  and  source  edges  and 
p  :  E  1-^  M.  represent  the  flow  on  edges  between 
pairwise  points  in  V ,  where  E  (Z  V  x  V .  The  up¬ 
per  capacities  on  the  source  edges  are  denoted  by 
Cs  and  on  sink  edges  by  Ct,  and  there  is  no  lower 
bound  on  the  capacities.  The  flows  p{x,  y)  on  the 
edges  (x^y)  are  bounded  by  \p{x,y)\  <  w{x,y). 
The  amount  of  flow  in  the  graph  can  be  expressed 
as  the  amount  of  flow  on  the  source  edges,  which 
we  want  to  maximize  under  flow  capacity  and  flow 
conservation  constraints.  In  this  section,  we  de¬ 
scribe  two  max-flow  problems.  The  first  is  dual 
to  the  problem  (15)  with  fidelity  term,  and  conse¬ 
quently  solves  the  original  problem  (13)  provided 
the  penality  parameter  rj  is  high  enough.  The  sec¬ 
ond  max-flow  problem  incorporates  the  supervised 
constraints  directly  without  the  need  for  a  very 
large  penalty  term.  The  following  derivations  ex¬ 
tend  the  continuous  max- flow  problem  [52,53]  from 
images  to  general  graphs. 

Max-flow  formulation  with  supervised  constraints 
as  fidelity  term 

The  following  problem  can  be  interpreted  as  a 
max-flow  problem  over  the  graph  and  is  shown  to 
be  dual  to  the  convex  partition  problem  (28). 

max  \p{ps,pt,p)  =  ^  Psix)\  (33) 

Ps,Pt,P  I  J 

x^V 

subject  to 

\p{x,y)\  <  w{x,y),  {x,y)€E;  (34) 

Ps{x)  <  Cs{x),  VxeD;  (35) 

Pt{x)  <  Ct{x),  VxeD;  (36) 

diviu  p{x)  -  Psix) pt{x)  =  0,  MxZV.  (37) 


where  (34)  is  the  flow  capacity  constraint  on  edges 
between  paiwise  nodes,  (35)  and  (36)  are  flow  ca¬ 
pacities  on  the  source  and  sink  edges,  and  (37) 
is  the  flow  conservation  constraint.  The  objective 
function  (33)  measures  the  total  amount  of  flow  on 
the  graph.  Due  to  constraint  (35),  the  maximiza¬ 
tion  problem  (33)  is  bounded  above  by  f^Cs(x), 
which  is  finite  provided  fifix),  x)  is  bounded  (true 
for  the  data  terms  considered  in  this  work). 

It  is  well  known  that  the  maximum  flow  prob¬ 
lem  is  equivalent  to  the  min-cut  problem,  where 
the  goal  is  to  find  a  partition  that  minimizes  the 
sum  of  the  weights  between  vertices  of  the  two  re¬ 
gions.  In  classical  max-flow  min-cut  theory,  to  ob¬ 
tain  the  final  classification  by  solving  the  maximum- 
flow  problem,  one  can  use  the  information  of  the 
flow  on  the  source  and  sink  edges.  If  for  x  €  V, 
there  is  a  non-saturated  path  between  s  and  x, 
then  X  is  in  class  1.  If  there  is  a  non-saturated 
path  between  x  and  t,  then  x  is  in  class  2. 

In  this  paper,  we  instead  solve  the  max-flow 
problem  (33)  by  continuous  optimization.  The  so¬ 
lution  to  the  min-cut  problem  can  be  obtained  di¬ 
rectly  from  the  Lagrange  multiplier  for  the  flow 
conservation  constraint  (37).  Introducing  such  a 
Lagrange  multiplier  for  the  flow  conservation  con¬ 
straint  (37)  leads  to  the  the  primal-dual  formula¬ 
tion  of  (33); 

min  max  \  Eips,pt,p;  X)  = 

A  Ps  ,Pt  ,P  t 


J2psix)  +  Y,  A(a;)  ( divu,  p  -  Ps  +  Pt) }  (3*) 


x^V  x^V 

subject  to 

\pix,y)\  <  wix,y), 

ix,y)  G  E-, 

(39) 

Psix)  <  Csix), 

Vx  G  V; 

(40) 

Ptix)  <  Ctix), 

Vx  €  V; 

(41) 

Rearranging  the  terms,  we  obtain 
min  max  \  Eips,pt,p;  X)  = 

A  Pa,Pt,P  t 

{  (1  -  X)ps  Xpt-\-  X  div^,  p}  I  (42) 

xev 

subject  to  (39),  (40)  and  (41). 

By  maximizing  the  above  problem  for  Ps,Pt 
and  p,  we  obtain  the  closed  form  expression  (28) 
subject  to  the  constraint  (29).  If  the  constraint 
(29)  was  not  met,  the  energy  could  become  ar¬ 
bitrarily  large  by  chosing  ps  or  pt  arbitrarily  low, 
contradicting  boundedness  from  above. 
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Max-flow  formulation  with  hard  supervised  constraints  where  ||s||2  =  augmented  La- 


We  also  describe  another  formulation  of  the 
problem,  which  avoids  using  a  fidelity  term  that  is 
forced  to  take  on  a  very  large  value  of  rj  to  enforce 
that  A  satisfies  the  supervised  constraints.  Define 
first  the  binary  functions 

1,  xeVf  f,,  ,  (0,  xGV^ 

0,  otherwise  ’  ^  ^  [1,  otherwise 

and  consider  the  following  modification  of  the  max- 
flow  problem  (88): 


grangian  method  can  be  applied  by  alternatively 
maximizing  Lc  for  the  dual  variables  Ps,Pt  and  p 
with  constraints  (39)-(41)  and  updating  the  La¬ 
grange  multiplier  A  as  follows: 


Algorithm  1  Max-flow  Algorithm 

Initialize  pi,  pi,  p^  and  A^.  For  k  =  1, ...  until 
convergence: 

•  Optimize  p  flow 


max  {v^Vs  —  v^Pt) 

Ps,Pt,P 

x£V 

(43) 

subject  to 

|p(a:,y)|  <  w{x,y). 

ix,y)€E-,  (44) 

Ps{x)  <  0, 

Vx  e  V;  (45) 

Pt{x)  <  0, 

Vx  €  V;  (46) 

divu,p(x)  -Ps{x)  -hpt{x)  =  0, 

VxeV.  (47) 

Introducing  the  Lagrange  multiplier  A  for  constraint 
(37),  we  obtain  the  following  Lagrangian  formula¬ 
tion  after  rearrangement  of  the  terms: 

min  max  <  E(ps,pt,p;  A)  = 

A  Pe,Pt,P  t 

^  { (u''  -  X)ps  -b  (A  -  v^)pt  -b  A  divu,  p}  I 

xev 

(48) 

As  there  are  no  lower  bounds  on  ps  and  pt,  it  can 
be  observed  that  optimal  solutions  A  must  satisfy 
the  constraints 


=  arg  max  - ||div^p  -  , 

\p{e)\<W{e)  \/e£E  ^ 

(52) 

where  =  ps^  —  pfl  -b  ^  is  fixed. 

•  Optimize  source  flow  ps 


pI^^  =  arg  max  ^  ps 

Ps(x)<Gs(x)  VxeV 


c 

2 


IPs  - 


Qk 


2 

2  ’ 


(53) 


where  G'^  =  Pt^  -\-  divu,p^+^  —  ^  is  fixed. 
•  Optimize  sink  flow  pt 


^_Li  C  1 1  u  1 1 2 

Pt  =  arg  max  --\\pt-H\\ 

Pt(x)<Ct{x)  Vxev  ^ 

(54) 

where  —  div^jp^''"^  -b  ^  is  fixed. 

•  Update  A 

Afc+i  =  yfe  _  c  (div„  -  Ps +  pt"^ )  • 


<  X  <  v^,  (49) 

otherwise  the  energy  could  be  made  arbitrarily 
large.  By  maximizing  for  the  flows  Ps,Pt  and  p, 
we  therefore  obtain  the  primal  problem 

minri4(A)  (50) 

\^B' 

subject  to  (49).  If  A* *  is  a  solution  to  (50),  then  A* 
is  obviously  also  a  solution  to  (28)  provided  the 
penalty  parameter  p  in  the  fidelity  term  of  (28)  is 
chosen  sufficiently  high. 

Algorithms 

The  dual  problems  (33)  or  (43)  are  solved  by 
the  augmented  Lagrangian  method  as  in  [52,53]. 
To  solve  (33),  construct  first  the  augmented  La¬ 
grangian  function  corresponding  to  (33): 

Lc{Ps,Pt,P,X)  =  ^  {ps  -b  A(div™p-ps  -bpt)} 

xev 

Q  2 

-  -j^\\<^kV,j,p-Ps+Pt\\2  ,  (51) 


To  solve  (43),  construct  the  augmented  Lagrangian 
function: 

Lc(Ps,Pt,P,  A)  =  ^  [v^Ps-v^Pt  +  X{<lwwP-Ps+Pt)] 
xev 

Q  2 

-  2  l|div,/,p -Ps +Pt|l2  ■  (55) 

The  augmented  Lagrangian  method  for  (43)  be¬ 
comes: 


Algorithm  Is  Supervised  Max-flow  Algorithm 

Initialize  pi,  pi,  p^  and  A^.  For  k  =  1, ...  until 
convergence: 

•  Optimize  p  flow 

=  arg  max  - ||div^p  -  , 

\p{e)\<W{e)  \/e£E  ^ 

where  =  ps^  —  pt^  -b  ^  is  fixed. 


(56) 
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•  Optimize  source  flow  ps 

=  arg  max  V  v^Ps  -^\\Ps-G^ 

Ps(x)<Ce{x)yxev  ^ 

where  =  pt^  +  divu,p'‘+^  —  ^  is  fixed. 

•  Optimize  sink  flow  pt 

=  arg  max  v^pt  -^\\pt- 

Pt{x)<Ct(x)  VxeV  ^ 

where  —  divu,p''+^  +  ^  is  fixed. 

•  Update  A 

^fc+i  ^  -  c(div„p'=+i  +Pt^^)  ■ 


Due  to  the  relation  between  problem  (48)  and 
problem  (28),  the  output  A  at  convergence  will  be  a 
solution  to  (28).  Similarly,  if  rj  is  chosen  sufficiently 
high  in  (28),  then  solution  A  to  (42)  will  also  be  a 
solution  to  (28). 

By  Theorem  1,  one  can  obtain  a  partition  which 
solves  (27)  by  the  thresholding  procedure  described 
in  (30). 

The  subproblems  (52)  and  (56)  for  updating  p 
are  solved  by  one  step  of  the  following  procedure: 

=  nw  {p'^  +  cV™(div^p'=  -  F'^))  .  (59) 

Above,  Uw  is  a  projection  operator  which  is  de¬ 
fined  as 

nw{p{x,y))  = 

P{x,y)  ii\p{x,y)\  <1, 

sgn{p{x,y))W{x,y)  if  \p{x,y)\  >  1, 

where  sgn  is  the  sign  function. 

The  subproblems  (53)  and  (54)  can  be  solved 


by 

Ps[x)  =  min(G''(x)  -k  -,Gs{x)); 

c 

(61) 

Pt{x)  =  mm{H^{x),  Gtix)). 

(62) 

The  subproblems  (57)  and  (58)  can 

be  solved  by 

Psix)  =  min(G'"(x)  -k  — ,Gs(x)); 

c 

(63) 

f 

Pt{x)  =  min{H'"{x)  -  — ,Gt(x)). 

(64) 

Note  that  the  gradient  and  divergence  operators  in 
the  algorithm  are  constructed  using  the  graphical 
framework,  as  shown  by  equations  (4)  and  (6). 


3.2  Extension  of  primal  augmented  Lagrangian 
method  to  graphs 

2 

2  ’ 

In  this  section,  we  describe  another  algorithm 

(57)  for  solving  the  convex  problem  (27),  by  extend¬ 
ing  the  Split-Bregman  algorithm  [26]  for  geomet¬ 
ric  problems  [25]  to  general  graphs.  It  has  recently 
been  shown  [45,  51]  that  the  Split-Bregman  algo¬ 
rithm  is  equivalent  to  solving  a  specialized  decom- 

^  position  of  total  variation  regularized  problems  by 
2  ’ 

the  augmented  Lagrangian  method.  We  use  the 

(58)  augmented  Lagrangian  notation  when  describing 
the  algorithm,  since  this  notation  has  already  been 
introduced  in  Section  3.1  when  deriving  the  max- 
flow  algorithms. 

Consider  the  general  minimization  problem 
min  TVn,{X)  +  ^  g{X{x),x).  (65) 

^  XGV 

The  ROE  model  is  a  special  case  of  (65)  in  the  case 
that  U  is  a  regular  image  domain,  A°  is  the  noisy 
input  image  and  g{X{x),x)  =  ]A(a:)  —  A°(a;)]^. 

In  our  case,  we  wish  to  choose  g  according  to 
(26)  and  impose  the  constraint  A  £  B'.  The  idea  is 
to  solve 

minjlqjli  -k  V  g{X{x),x) 

X.Q  '  ^ 

xGV 

s.t.  q  =  VujA,  (66) 

where  ||s||]^  =  t>y  the  augmented  La¬ 

grangian  method. 

We  introduce  a  Lagrange  multiplier  (j)  for  the 
constraint  (66).  This  results  in  the  augmented  La¬ 
grangian  functional 

Lc{X,q,(j))  =  ||g||i  +  ^  {g(A(a;),a:)-k((i-(g-V^A)} 

xGV 

+  |ll9-V.A||^  (67) 

where  c  is  a  constant  and  ||s||2  = 

We  want  to  hnd  a  saddle  point  of  (67)  over  A,  q 
and  (f): 

maxmin  Lc(A,  9,  (/>)  (68) 

(p  \,q 

by  alternating  between  minimizing  for  A  and  q 
(A'",  g'')  =  arg  min  Lc(A,g,  (()'')  (69) 

\,q 

and  updating  the  Lagrange  multiplier  by  one  step 
of  gradient  ascent: 

^k+i  =  +  c(g'=+i  -  VA'^+i).  (70) 

The  minimization  problem  (69)  can  be  sepa¬ 
rated  into  two  subproblems: 

X!  {^(^(a;), x)  -  ■  V^A}  -k  ^  ||g  -  V^Ajj^  ; 

A  ^ 

x&V 
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(71) 


min  Iklli  +  X!  9  +  2  ■  (''2) 

^  xGV 


Therefore,  the  algorithm  is  the  following: 


Algorithm  2  Augmented  Lagrangian  Algorithm 

Initialize  (fd ,  and  A^.  For  fc  =  1, . 

..  until  con- 

vergence: 

•  Optimize  A 

=mm^  {g{X{x),x)-4d -V y,X]  +  ^ 

Ilg'^-V^A 

(73) 

•  Optimize  q 

=minllg|l,  +  ^  +  l  ||g- 

q  ^ ^  Z  " 

V„A'=+i||2. 

(74) 

•  Update  Lagrange  multipliers 

^fe+l  =  _  yxk+ly 

(75) 

Again,  as  in  the  max-flow  algorithm,  the  final 
binary  classification  is  obtained  by  thresholding  A 
to  either  0  or  1. 

The  subproblem  (73)  gives  the  Euler-Lagrange 
equation: 

dg 


dX 


+  cdiv^iq^ -Vy^X)  +  =  0,  (76) 


where  in  this  case  ^  =  Ct  —  Cs- 

We  solve  the  above  subproblem  using  one  step 
of  forward  Euler: 


A'^+i  -  X’^ 
dt 


=  -{Ct  -Cs+  cdiv„(g'=  -  V^A'=) 

-k  div^ ((()'')).  (77) 


This  becomes 
Afc+i  _  yfc 


dt 


=  -{Ct  -Cs+  cdiv„(g'=)  -  cA^A'^) 

-k  div„(()i'‘)).  (78) 


All  the  operators,  as  was  stated  before,  are  formu¬ 
lated  in  a  graph  setting. 

We  solve  subproblem  (74)  in  the  same  way  as 
it  is  done  in  [51]: 


7(1  -  2/)l  > 

0,  if  |&(a:,t/)|  <  1, 


(79) 

where  b  =  jcVA  — 

We  have  tried  solving  the  subproblem  (73)  above 
in  another  way.  A  similar  scheme  is  used,  except 
the  Laplace  operator  is  calculated  implicitly,  and 
we  proceed  further  by  considering  its  terms  as  a 
linear  combination  of  the  eigenvectors  of  the  ran¬ 
dom  walk  Laplacian,  in  a  similar  way  as  in  [4] 
and  [40].  Only  a  small  fraction  of  the  eigenvec¬ 
tors  are  used.  This  way  of  solving  the  subproblem 
turns  out  to  be  several  times  faster  than  the  one 
previously  discussed,  but  because  it  does  not  per¬ 
form  well  in  the  case  of  non-random  fidelity,  we 
did  not  use  it.  A  disadvantage  of  this  method  is 
that  it  only  uses  a  small  fraction  of  the  eigenvec¬ 
tors,  which  might  not  contain  enough  information 
to  result  in  an  accurate  classification,  as  was  the 
case  with  experiments  with  non-random  fidelity. 
However,  it  also  has  some  advantages,  which  are 
discussed  in  the  next  section. 


3.3  Avoiding  trivial  global  minimizers 

If  the  number  of  supervised  points  U  are 
very  low,  the  global  minimizer  of  (13)  may  just 
be  the  trivial  solution  S  =  or  S  =  V^.  This 
was  the  case  for  a  small  number  of  our  experi¬ 
ments.  In  order  to  avoid  this  problem,  the  cost 
of  these  trivial  solutions  can  be  increased  by  in¬ 
creasing  the  number  of  edges  incident  to  the  su¬ 
pervised  points  U  Non-supervised  points 
in  the  graph  are  connected  by  edges  to  their  M 
nearest  neighbors.  Supervised  points  can  instead 
be  connected  to  their  K  nearest  neighbors,  where 
K  >  M,  thereby  increasing  the  cost  of  the  parti¬ 
tions  S  =  Vf  and  S  =  V\ 

An  interesting  observation  is  that  if  the  sub¬ 
problem  (73)  in  the  primal  max-flow  algorithm 
is  solved  via  the  approximate  eigendecomposition, 
the  algorithm  does  not  result  in  the  trivial  solu¬ 
tion  S  =  and  S  =  V^,  even  when  the  number 
of  supervised  points  are  very  low.  The  reason  for 
this  seems  to  be  that  the  approximation  resulting 
from  not  using  all  the  eigenfunctions  erases  the  un¬ 
wanted  trivial  global  minimizers  from  the  energy 
landscape. 


4  Results 

The  results  for  several  data  sets  are  summa¬ 
rized  in  Table  I.  In  all  experiments,  we  have  con¬ 
structed  the  graph  using  the  M  nearest  neighbors 
and  approximated  the  eigendecomposition  of  the 
Laplacian  using  only  the  M  largest  eigenvalues. 
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The  technique  described  in  Section  3.3  for  avoid¬ 
ing  trivial  global  minimizers,  was  used  on  the  two 
moons  data  set  2  in  case  of  less  than  3.25  %  super¬ 
vised  points.  Below,  we  provide  details  about  each 
of  the  data  sets  that  we  used.  The  results  were 
computed  on  a  2.4  GHz  Intel  Core  i2  Quad. 

In  the  case  when  we  need  to  compute  the  eigen¬ 
vectors  and  eigenvalues  of  the  random  walk  graph 
Laplacian,  we  first  use  a  fast  numerical  solver  called 
the  Rayleigh- Chebyshev  procedure  [I]  to  compute 
those  of  the  symmetric  graph  Laplacian  and  then 
use  Theorem  2.  The  procedure  itself  is  a  modifica¬ 
tion  of  an  inverse  subspace  iteration  method  using 
adaptively  determined  Chebyshev  polynomials.  It 
is  also  a  robust  method  that  converges  rapidly  and 
that  can  handle  cases  when  there  are  eigenvalues  of 
multiplicity  greater  than  one.  The  calculations  are 
made  even  more  efficient  by  the  fact  that  only  a 
small  portion  of  the  eigenvectors  need  to  be  calcu¬ 
lated,  as  the  most  significant  nodes  contain  enough 
information  to  produce  accurate  results.  To  have  a 
fair  comparison,  we  use  the  same  number  of  eigen¬ 
vectors  per  data  set  for  all  methods. 

We  have  used 

/(A(a;),a;)  =  ?7(a;)|A(a;)  -  Ao(x)|^  (80) 

for  all  our  computations.  Here,  Aq  is  the  initial 
value  of  A,  and  ri{x)  is  a  function  that  takes  on  a 
value  of  a  constant  r]  on  fidelity  points  and  zero 
elsewhere. 

Below,  we  provide  more  detail  on  the  results 
for  each  of  the  benchmark  data  sets,  as  well  as 
a  description  of  the  data  set  itself.  In  addition, 
we  provide  a  comparison  of  the  results  to  those 
of  some  of  the  best  methods,  including  the  binary 
MBO  and  CL  algorithms. 

4.1  MNIST 


&  / 


Fig.  1  Examples  of  digits  from  the  MNIST  data  base 

The  MNIST  digits  data  set  [38] ,  available  at 
http://yann.lecun.com/exdb/mnist/,  is  a  data  set 
of  70000  28  X  28  images  of  handwritten  digits  from 


0  —  9.  However,  since  our  method  is  only  binary,  we 
obtained  a  subset  of  this  set  to  classify,  in  particu¬ 
lar,  digits  4  and  9  (since  these  digits  are  sometimes 
hard  to  distinguish,  if  handwritten).  This  created 
a  set  of  13782  digits,  each  either  4  or  9.  Starting 
from  some  initial  classification  of  the  points  and 
using  only  a  small  fraction  of  the  set  as  fidelity, 
the  goal  is  to  classify  each  image  into  being  either 
a  4  or  9. 

Using  random  initialization  and  random  fidelity, 
the  max-flow  method  obtained  an  accuracy  of  around 
98.48%  averaged  over  100  runs  with  different  fi¬ 
delity  sets  of  500  randomly  chosen  points  (or  only 
3.62%  of  the  set).  The  primal  augmented  Lagrangian 
method’s  accuracy  was  around  98.44%.  The  accu¬ 
racy  of  binary  MBO  graph  method  from  [40]  and 
the  binary  CL  graph  method  from  [4]  was  slightly 
lower  than  that  of  our  methods;  the  algorithms 
were  able  to  achieve  an  average  accuracy  of  around 
98.37%  and  98.29%,  respectively.  Table  1  summa¬ 
rizes  the  above  and  also  shows  results  in  the  case 
when  the  initialization  is  constructed  using  the 
thresholded  second  eigenvector  of  the  Laplacian  or 
when  the  fidelity  region  is  chosen  nonrandomly  by 
only  considering  points  that  give  values  in  the  cor¬ 
ners  of  leftmost  image  in  Figure  4(a).  More  detail 
on  the  latter  information  will  be  given  in  Section 
4.5.  The  parameters  for  Algorithm  1  were:  c  =  0.5, 
rj  =  50.  For  Algorithm  2,  they  were:  c  =  0.008, 
ri  =  400,  dt  =  0.032. 

To  compare  with  other  methods,  we  note  a  re¬ 
cent  result  by  Hu  et  al.  [33],  which  is  an  unsu¬ 
pervised  method.  The  authors  of  the  paper  also 
tested  only  digits  4  and  9  and  obtained  a  purity 
score  (measures  the  fraction  of  the  nodes  that  have 
been  assigned  to  the  correct  community)  of  0.977. 
The  GenLouvain  algorithm  obtained  a  purity  score 
of  0.975.  In  addition,  many  other  algorithms  have 
used  the  full  MNIST  data  set  with  all  10  digits. 
For  example,  Cheeger  cuts  [47],  boosted  stumps 
[34,  38] ,  and  transductive  classification  [48]  have 
obtained  accuracies  of  88.2%,  92.3%-98.74%,  and 
92.6%,  respectively.  Also,  papers  on  fc-nearest  neigh¬ 
bors  [37,38],  neural  or  convolutional  nets  [15,37, 
38],  nonlinear  classifiers  [37,38]  and  SVM  [18,37] 
report  accuracies  of  95.0%-97.17%,  95.3%-99.65%, 
96.4%-96.7%,  and  98.6%-99.32%,  respectively.  The 
aforementioned  approaches  are  supervised  learn¬ 
ing  methods  using  60,  000  out  of  70, 000  digits  (or 
about  85.71%  of  the  whole  data  set)  as  a  train¬ 
ing  set.  Our  algorithms,  taking  only  3.6%  of  the 
data  set  as  fidelity,  are  competitive  with,  and  in 
most  cases  outperform,  these  methods.  Moreover, 
we  have  not  performed  any  preprocessing  or  initial 
feature  extraction  on  the  data  set,  unlike  most  of 
the  mentioned  algorithms. 
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4.2  Banknote  Authentication  Data  Set 

The  banknote  authentication  data  set,  from  the 
UCI  machine  learning  repository  [2],  is  a  data  set 
of  1372  features  extracted  from  images  (400  x  400 
pixels)  of  genuine  and  forged  banknotes.  Wavelet 
transform  was  used  to  extract  the  features  from 
the  images.  The  goal  is  to  segment  the  banknotes 
into  being  either  genuine  or  forged. 

The  results  are  shown  in  Table  1.  With  the 
max-flow  method,  for  a  5.1%  fidelity  set,  we  were 
able  to  obtain  an  average  accuracy  (over  100  dif¬ 
ferent  fidelity  sets)  of  around  99.09%,  while  the 
primal  augmented  Lagrangian  method  achieved  a 
similar  accuracy  of  98.75%.  The  results  did  not 
deteriorate  much  for  a  smaller  fidelity  set  of  3.6%, 
with  the  two  methods  achieving  an  accuracy  of 
98.83%  and  98.29%,  respectively.  The  parameters 
for  Algorithm  1  were:  c  =  0.15, 77  =  250.  For  Algo¬ 
rithm  2,  they  were:  c  =  0.08,  rj  =  50,  dt  =  0.5. 

We  compare  this  result  to  the  binary  MBO  al¬ 
gorithm,  which  achieved  a  lower  accuracy  of  95.43% 
and  93.48%  for  5.1%  and  3.6%  fidelity  sets,  respec¬ 
tively.  For  the  binary  GL  method,  the  results  were 
also  not  as  good-  97.76%  and  96.10%,  for  5.1%  and 
3.6%  fidelity  sets,  respectively. 

4.3  Two  Moons 


Fig.  2  Two  moons  example  with  max-flow  method 

This  data  set  is  constructed  from  two  half 
circles  in  with  a  radius  of  one.  The  centers  of  the 
two  half  circles  are  at  (0,0)  and  (1,0.5).  A  thou¬ 
sand  uniformly  chosen  points  are  sampled  from 
each  circle,  embedded  in  and  i.d.d.  Gaussian 
noise  with  standard  deviation  0.02  is  added  to  each 
coordinate.  Therefore,  the  set  consists  of  two  thou¬ 
sand  points.  Starting  from  some  initial  classifica¬ 
tion  of  the  points,  the  goal  is  to  segment  the  two 
half  circles. 

For  the  max-flow  method,  in  the  case  of  65  or 
lower  number  of  hdelity  points  (3.25  %),  we  in¬ 
creased  the  number  of  edges  of  supervised  points  to 
others  to  avoid  the  trivial  global  minimizer  where 


all  points  but  the  supervised  ones  are  classified  as 
one  class. 

Using  random  initialization  and  random  fidelity, 
for  the  max-flow  method,  we  obtained  an  aver¬ 
age  accuracy  (over  100  different  fidelity  sets)  of 
97.10%  and  97.05%  in  the  case  of  100  and  50  fi¬ 
delity  points,  respectively.  An  example  of  a  solu¬ 
tion  is  shown  in  Figure  2,  with  the  two  classes  col¬ 
ored  in  red  and  blue.  The  primal  augmented  La¬ 
grangian  method  achieved  an  accuracy  of  around 
97.07%  for  100  fidelity  points  and  around  96.78% 
for  50  fidelity  points.  The  parameters  for  Algo¬ 
rithm  1  were:  c  =  0.5,  ry  =  50.  For  Algorithm  2, 
they  were:  c  =  0.32,  77  =  100,  dt  =  0.008. 

To  compare  this  with  binary  MBO,  the  method 
obtained  98.41%  and  97.53%  accuracy  for  100  and 
50  hdelity  points,  respectively,  which  is  very  simi¬ 
lar  to  the  results  of  the  binary  GL  graph  method. 

4.4  Rods 

We  have  also  tested  this  algorithm  on  two  other 
synthetic  data  sets  created  using  the  rods  pictured 
in  Figure  3.  Around  two  thousand  uniformly  cho¬ 
sen  points  were  sampled  from  each  image,  and  then 
embedded  in  R^™.  Finally,  noise  was  added  to  each 
of  the  points,  much  like  the  case  with  the  two 
moons  data  set. 

In  the  case  of  random  hdelity  region,  we  ob¬ 
tained  accuracy  in  the  98*^  or  99*^  percentile,  no 
matter  what  initialization.  In  the  case  of  hdelity  re¬ 
gion  in  the  corner,  we  obtained  interesting  global 
minimizers  for  the  two  data  sets.  The  compari¬ 
son  of  our  results  with  the  binary  MBO  and  GL 
method  is  detailed  in  the  next  section.  The  pa¬ 
rameters  for  Algorithm  1  were:  c  =  0.01,  77  =  50. 
For  Algorithm  2,  they  were:  c  =  0.016,  77  =  500, 
dt  =  0.512. 

4.5  Comparison  of  our  convex  algorithms  to 
binary  MBO  and  GL  methods 

After  comparing  the  results  of  our  convex  algo¬ 
rithms  to  the  binary  MBO  [40]  and  binary  GL  [4] 
graph  methods,  we  have  reached  the  following  con¬ 
clusions  based  on  our  work: 

•  As  long  as  the  hdelity  points  are  well  repre¬ 
sented  for  each  class  (meaning  the  hdelity  points 
represent  a  whole  variety  of  points  in  the  class), 
the  binary  MBO  method  and  the  binary  GL 
method  have  no  trouble  hnding  the  correct  min¬ 
imizer  or  something  very  close.  The  initializa¬ 
tion  might  not  matter;  even  with  a  bad  initial¬ 
ization,  the  local  minimizer  will  still  be  found. 
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Table  1  Comparison  of  methods 


max-flow 

primal  augmented 
Lagrangian 

binary 

MBO 

binary 

GL 

MNIST  (3.6%  fidelity)  random  initialization,  random  fidelity 

98.48% 

98.44% 

98.37% 

98.29% 

MNIST  (3.6%  fidelity)  2nd  eigenvector  initialization,  random  fidelity 

98.48% 

98.43% 

98.36% 

98.25% 

MNIST  (3.6%  fidelity)  random  initialization,  corner  fidelity 

98.47% 

98.40% 

62.35% 

64.39% 

MNIST  (3.6%  fidelity)  2nd  eigenvector  initialization,  corner  fidelity 

98.46% 

98.40% 

63.87% 

63.19% 

Banknote  Authentication  Data  Set  (5.1%  fidelity) 

99.09% 

98.75% 

95.43% 

97.76% 

Banknote  Authentication  Data  Set  (3.6%  fidelity) 

98.83% 

98.29% 

93.48% 

96.10% 

two  moons  (5%  fidelity) 

97.10% 

97.07% 

98.41% 

98.31% 

two  moons  (2.5%  fidelity) 

97.05% 

96.78% 

97.53% 

98.15% 

Fig.  3  Rod  1  and  Rod  2 

Our  convex  methods  find  the  local  minimizer 
easily. 

•  Problems  occur  when  the  fidelity  is  not  cho¬ 
sen  randomly.  In  this  case,  even  if  the  initial¬ 
ization  is  random,  the  convergence  might  not 
occur  for  the  binary  MBO  and  GL  methods. 
In  all  our  experiments  with  the  rods  data  sets 
and  MNIST,  the  local  minimizer  was  not  found 
by  the  two  methods.  However,  our  convex  algo¬ 
rithms  still  found  the  correct  local  minimizer. 

These  conclusions  are  supported  by  the  work 
done  on  the  MNIST  digits  data  set,  using  digits  4 
and  9  only.  The  second  vs.  third  eigenvector  of  the 
symmetric  graph  Laplacian  are  graphed  in  Figure 
4(a),  with  one  digit  represented  by  blue  and  an¬ 
other  by  red.  The  results  of  experiments  on  this 
data  set  are  found  in  Figure  4.  Each  row  repre¬ 
sents  a  different  experiment:  first  two  rows  con¬ 
tain  experiments  with  random  initialization,  while 
last  two  rows  contain  experiments  with  fidelity  in 
a  constrained  area.  The  initialization  is  random 
for  experiments  in  first  and  third  row,  and  is  con¬ 
structed  by  thresholding  the  second  eigenvector  of 
the  Laplacian  for  the  results  in  the  second  and 
fourth  row.  The  first  column  represents  the  initial¬ 
ization,  while  the  second  and  third  columns  are 
results  for  the  max-flow  and  binary  MBO  algo¬ 


rithm,  respectively.  The  fidelity  points  are  marked 
by  yellow  and  magenta  for  the  two  classes. 

We  see  that  if  the  fidelity  region  is  well  rep¬ 
resented  in  the  data  set,  no  matter  what  initial¬ 
ization,  none  of  the  algorithms  have  a  problem 
finding  a  close  to  perfect  solution  (accuracy  is  be¬ 
tween  98  and  99  percent-  see  Table  1).  However, 
when  the  fidelity  region  is  not  random  (in  this  case 
constrained  to  only  the  nodes  whose  correspond¬ 
ing  entry  in  the  second  or  third  eigenvector  of  the 
Laplacian  match  a  certain  range),  we  see  that  the 
binary  MBO  and  GL  algorithms  fail  to  obtain  an 
accurate  solution;  its  accuracy  is  below  70  percent. 
However,  the  max-flow  algorithm  and  the  primal 
augmented  Lagrangian  methods  achieve  an  almost 
perfect  solution  of  98.47%  and  98.40%  accuracy, 
respectively. 

The  two  conclusions  are  also  supported  by 
the  work  done  on  the  two  rod  data  sets  created 
from  images  displayed  in  Figure  3.  Experiments 
with  the  first  and  second  rod  image  are  shown  in 
Figures  5  and  6,  respectively.  The  first  two  rows 
represent  cases  with  a  random  fidelity  set,  while 
the  last  two  rows  are  experiments  with  corner  fi¬ 
delity.  Cases  with  random  initialization  are  in  first 
and  third  rows,  and  second  eigenvector  initializa¬ 
tion  is  used  in  experiments  in  the  second  and  fourth 
row.  The  first  column  is  the  initialization,  second 
column  is  the  max-flow  algorithm  result,  and  third 
column  is  the  binary  MBO  result. 

For  rod  1,  we  found  that  when  the  fidelity  is 
chosen  randomly,  the  minimizer  divides  the  bot¬ 
tom  two  rods  from  the  rest  of  the  image.  When 
the  fidelity  is  chosen  at  the  corners,  the  minimizer 
is  shown  in  the  bottom  two  rows  of  the  second  col¬ 
umn.  The  convex  max-flow  and  primal  augmented 
Lagrangian  algorithms  are  able  to  attain  these  min- 
imizers,  while  the  binary  MBO  and  GL  algorithms 
struggle  in  the  case  of  non-random  fidelity.  The  sit¬ 
uation  is  similar  with  rod  2. 
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(b)  Random  initialization  and  fidelity 
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(c)  max-fiow  result 
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(d)  binary  MBO  result 
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Second  Eigenvector 
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(i)  max-fiow  result 


-0.01  0  0.01  0.02 


Second  Eigenvector 

(j)  binary  MBO  result 


Second  Eigenvector 

(k)  2nd  eigenvector  initialization,  cor¬ 
ner  fidelity 


Second  Eigenvector 

(1)  max-flow  result 


-0.01  0  0.01  0.02 


Second  Eigenvector 

(m)  binary  MBO  result 


Fig.  4  MNIST.  Left:  initialization,  supervised  points  are  marked  in  yellow  and  magenta.  Middle:  max-flow  algorithm 
result.  Right:  binary  MBO  result 
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(d)  2nd  eigenvector  initialization 


(e)  max-flow  result 


(f)  binary  MBO  result 


(g)  Random  initial.,  corner  fidelity 


(h)  max-flow  result 


(i)  binary  MBO  result 


(j)  2nd  eigenvector  initialization  (k)  max- flow  result  (1)  binary  MBO  result 

Fig.  5  Results  for  Rod  1.  Left:  initialization,  supervised  points  are  marked  in  yellow  and  green.  Middle:  max-ffow 
algorithm  result.  Right:  binary  MBO  result 


Note  about  MBO  algorithm 

As  noted  in  [40],  the  first  step  of  the  MBO  al¬ 
gorithm  (heat  equation  with  an  extra  term)  was 
solved  using  the  eigenvalue  and  eigenvector  decom¬ 
position  of  the  Laplacian.  However,  a  disadvantage 
of  solving  equations  using  this  method  is  that,  for 
it  to  be  efficient,  only  a  fraction  of  eigenvectors  are 
used,  and  they  might  not  contain  enough  informa¬ 
tion  to  result  in  an  accurate  classihcation.  Nat¬ 


urally,  the  more  eigenvectors  one  computes,  the 
longer  the  process  will  take. 

As  an  alternative  way  of  solving  the  first  step 
(20)  of  the  MBO  algorithm,  we  have  tried  using 
just  the  simple  forward  heat  equation  solver.  How¬ 
ever,  this  did  not  result  in  an  accurate  segmenta¬ 
tion  in  the  case  of  non-random  hdelity,  thus  not  im¬ 
proving  the  results  from  the  original  way  of  solving 
it.  This  shows  that  the  algorithm  is  getting  stuck  in 
a  local  minimum,  since  the  problem  is  clearly  not 
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(a)  Random  initialization,  random  fi-  (b)  max-flow  result 

delity 


(c)  binary  MBO  result 


(d)  2nd  eigenvector  initialization 


(e)  max-flow  result 


(f)  binary  MBO  result 


(g)  Random  initialization,  corner  fi-  (h)  max-flow  result  (i)  binary  MBO  result 

delity 


(j)  2nd  eigenvector  initialization 


(k)  max-flow  result 


(1)  binary  MBO  result 


Fig.  6  Results  for  Rod  2.  Left:  Initialization,  supervised  points  are  marked  in  yellow  and  green.  Middle:  max-flow 
algorithm  result.  Right:  binary  MBO  result 


the  lack  of  information  encoded  within  the  small 
number  of  eigenvectors  used. 


4.6  Comparison  of  convergence,  speed,  and  energy 

The  stopping  criterion  used  for  all  algorithms 
was  taken  to  be  the  point  at  which  the  square  of 
the  relative  L2  norm  between  the  current  and  pre¬ 
vious  iterate  is  negligible,  or  below  a  certain  con¬ 


stant  a.  With  the  exception  of  the  MNIST  data  set 
(where  a  =  5*le  — 10),  the  max-flow,  binary  MBO 
and  GL  algorithms  stabilize  around  a  =  le— 17  or 
le— 16.  The  primal  augmented  Lagrangian  method 
stabilizes  around  a  =  le  —  08  or  le  —  09. 

Table  2  includes  information  about  the  number 
of  iterations  needed  to  reach  stability,  and  also  the 
timing  results  for  each  data  set. 

We  have  also  computed  the  initial  and  final  en¬ 
ergy  for  each  data  set.  The  energy  was  calculated 
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Table  2  Number  of  Iterations  and  Timing 


Number  of  iterations 

max-flow 

primal  augmented 
Lagrangian 

binary  MBO 

binary  GL 

MNIST 

426 

2709 

10 

52 

Banknote  Authentication  Data  Set 

314 

725 

7 

449 

two  moons 

1031 

451 

8 

108 

Timing  (s) 

max-flow 

primal  augmented 
Lagrangian 

binary  MBO 

binary  GL 

MNIST  “ 

2.88 

43.21 

0.52 

0.78 

Banknote  Authentication  Data  Set 

1.21 

3.76 

0.90 

0.95 

two  moons 

4.13 

5.23 

2.30 

2.98 

“  This  is  the  timing  of  the  method  using  already  computed  weights  and  eigenvalues/eigenvectors  of 
the  random  walk  Laplacian. 


Table  3  Comparison  of  Final  Energy 


Data  Set 

initial  energy 

max-flow 

final  energy 

primal  augmented 
Lagrangian 
final  energy 

binary  MBO 
final  energy 

binary  GL 
final  energy 

MNIST  (random  fid) 

23223 

789 

789 

798 

804 

MNIST  (non-random  fid) 

23223 

791 

792 

2167 

5363 

Banknote  Authentication 

3308 

30 

37 

51 

42 

two  moons 

3802 

533 

535 

538 

548 

rod  1  (random  fid) 

4159 

146 

148 

163 

159 

rod  1  (non-random  fid) 

4159 

88 

89 

825 

391 

rod  2  (random  fid) 

4528 

171 

176 

186 

184 

rod  2  (non-random  fid) 

4528 

101 

105 

709 

421 

using 

=  \  X!  (81) 

x,y£V 

where  A(a;)  is  0  if  node  x  was  classified  to  be  in 
the  first  class,  and  1  if  it  was  classified  to  be  in 
the  second  class.  Note  that  the  energy  here  is  ex¬ 
actly  TVw(X).  Table  3  includes  information  about 
the  initial  and  final  energy  for  each  method.  We 
see  that  the  max-flow  algorithm  is  able  to  obtain 
the  lowest  energy  in  each  case.  In  general,  one  can 
see  that  the  convex  algorithms  are  able  to  obtain 
the  global  minimizer  in  all  cases,  while  the  binary 
MBO  and  GL  algorithms  struggle  in  the  case  of 
non-random  fidelity. 

5  Balancing  constraints  on  the  mecx-flow 
method  using  graphs 

This  section  demonstrates  how  to  incorporate 
balancing  constraints  of  the  form  (18),  which  take 
into  account  the  size  of  the  classes.  Volume  con¬ 
straints  have  been  proposed  for  image  segmenta¬ 
tion  models  in  a  convex  framework  in  [35,43].  We 


propose  an  efficient  algorithm  for  incorporating  the 
hard  volume  constraint  (18)  on  graphs  by  slightly 
modifying  the  dual  max-flow  problem  using  a  new 
variable  p  :  V  ^  R  as  follows: 

max  \p{ps,Pt,p)  =  V  {Ps{x)  -ap)  }•  (82) 

Ps,Pt,P,P  I  J 

x£V 

subject  to 


\p{x,y)\  <  w{x,y), 

{x,y)  G 

E; 

(83) 

Ps{x)  <  Cs{x), 

Va;  G  V; 

(84) 

Pt{x)  <  Ct{x), 

Vx  G  V; 

(85) 

-Ps{x)  +Ptix)  -bp  =  0, 

Vx  G  V; 

(86) 

p  is  a  constant  function. 

(87) 

Introducing  a  Lagrange  multiplier  A  for  the 
constraint  (86)  yields  the  primal-dual  model 

min  max  \  E{ps,pt,p,  p\  X)  =  y^  {ps{x)  -  ap) 

X  Ps,Pt,P  I  ^ ^ 

x^V 

+  ^  A(a;)  ( divu,  p  -  Ps  +  Pt  +  p)'^  (88) 
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subject  to 


•  Optimize  p  flow 


\p{x,y)\  <  w{x,y), 

(x,y)  G  E] 

(89) 

Pa{x)  <  Ca{x), 

Vx  G  V; 

(90) 

Pt{x)  <  Ct{x), 

Vx  G  V; 

(91) 

p  is  a  constant  function. 

(92) 

Rearranging  the  terms,  we  obtain 
min  max  \  E{ps,pt,p,  p;  X)  = 

A  Pa,Pt,P,P  t 

{{l-  X)ps  +  Xpt  +  p{X  -a)  +  X  div^  p}  | 

xev 

(93) 

subject  to  (89)  -  (92). 

The  intuition  of  having  the  above  model  lies  in 
the  following.  We  observe  that  if  A  ^  B',  the  en¬ 
ergy  can  be  arbitrarily  large  by  choosing  ps  or  pt 
arbitrarily  small,  contradicting  boundedness  from 
above.  From  the  second  to  last  term,  it  follows  that 
if  the  balancing  constraint  (18)  is  not  satisfied,  the 
energy  can  be  made  arbitrarily  large  by  chosing  p 
arbitrarily  high  or  low.  Therefore,  by  maximizing 
for  ps,pt,p  and  p,  we  obtain  the  closed  form  ex¬ 
pression  (28)  subject  to  the  constraints  (29)  and 
the  balancing  constraint  (18). 

In  contrast  to  the  case  with  the  model  without 
balancing  constraints,  it  cannot  be  guaranteed  in 
advance  that  a  global  minimizer  is  obtained  by  the 
thresholding  procedure  described  in  Theorem  3.  If 
the  solution  is  binary,  it  must  also  be  a  global  min¬ 
imizer  of  the  binary  constrained  problem,  since  the 
convex  set  B'  contains  the  binary  set  B.  In  the  ex¬ 
periments,  the  solution  tends  to  be  binary  or  very 
close  to  binary,  indicating  that  a  global  minimizer, 
or  close  approximation,  can  be  obtained. 

We  again  construct  the  augmented  Lagrangian 
functional 

LciPs,Pt,P,X)  =  -ap+Ps+X{div^p-ps+pt+p) 
x&V 

-  I  l|divu.p-Ps +Pt +  p|iy  (94) 

which  is  exactly  (55)  if  p  is  zero. 

We  have  the  following  primal  augmented  La¬ 
grangian  algorithm  for  minimizing  the  above  func¬ 
tional,  where  we  alternate  between  maximizing  Lc 
for  the  dual  variables  and  updating  the  Lagrange 
multiplier  A: 


Algorithm  3  Balancing  Constraints 


Initialize  p\,  p],  p^  and  A^.  For  k  =  1, ...  until 
convergence: 


=  argmax  -  ^  \\div^u  P  -  F’^\\1  , 
||p(e)||<  W(e)  VeGiS  ^ 

(95) 

where  =  p^^  —  p^^  +  XL  —  pk  jg  fixed. 

•  Optimize  source  flow  ps 

Pa  =  argmax  Ps -7;  \\Ps  -  G  L  ^ 

Pa{x)<  Ce{x)\/x^V  ^ 

(96) 

where  =  Pt^  +  divu,  ^  is  fixed. 

•  Optimize  sink  flow  pt 

EJ-I  C  II  ^||2 

Pt  ■=  argmax  II2’ 

Pt{x)<  Ctix)  \/xev  ^ 

(97) 

where  =  Pa^^^  —  div^jP^'*'^  +  T"  “ 
fixed. 

•  Optimize  p 

p'=+i=argmax  (98) 

P  xav  ^ 

where  =  —Pt^^  —  divu,p^+^  +  V  +  Ps^^ 
fixed. 

•  Update  A 

Afc+i  =  -  c  (divu,  p^+^  -  -k  -k  p’"^^) . 


The  optimization  problem  (95)  for  p  can  be 
solved  by  one  step  of  the  projected  gradient  method 
as  follows: 

=  nw{p  +  cV^(div^p'=  -  F’^)),  (99) 

where  Uw  is  the  projection  defined  in  (60). 

The  subproblems  (96)  and  (97)  can  be  solved 


by 

Pa(x)  =  mm(G''(x)  +  -,Ca(x)); 

c 

(100) 

Pt(x)  =  mhi(H^(x),  Ct(x)). 

(101) 

We  solve  (98)  using 

/+!  =  mean(-ps'=+^  +p/'+^  +  div^p'=+^  -k  p’" 


In  step  (102),  the  constraint  that  p  should  be 
constant  is  imposed  exactly  by  computing  the  av¬ 
erage  of  the  pointwise  unconstrained  maximizers 
p{x)  for  X  G  V,  and  then  the  average  value  is  as¬ 
signed  to  p{x)  y  X  gV. 

Just  like  in  the  previous  cases,  the  final  classi¬ 
fication  is  obtained  by  thresholding  A. 
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5.1  Results  of  the  balancing  constraints  method 

We  tested  our  balancing  constraints  method  on 
several  data  sets  using  random  initialization  and 
fidelity.  It  handles  the  case  of  a  small  fidelity  re¬ 
gion  better  than  the  original  max-flow  method  (Al¬ 
gorithm  1)  and  gives  higher  accuracy  everywhere. 
The  results  are  displayed  in  Table  4.  In  general, 
the  solution  is  very  close  to  binary,  with  some 
small  differences  that  may  be  explained  by  the  fi¬ 
nite  stopping  criterion  of  the  algorithm,  indicat¬ 
ing  that  close  approximations  to  global  minimiz- 
ers  are  obtained.  To  measure  how  close  the  solu¬ 
tion  A  is  to  binary,  we  have  computed  the  norm 
where  is  defined  in  (30)  and 
t  =  0.5.  The  norm  should  be  0  if  A  is  binary.  In 
the  experiments,  the  values  of  the  norm  range  from 
0.0005  to  6*  10-1®. 

5.1.1  Two  Moons 

For  the  two  moons  example,  starting  from  20  fi¬ 
delity  points,  we  obtained  very  reasonable  results. 
For  50,  40,  30  and  20  fidelity  points,  we  obtained 
97.19%,  97.12%,  97.11%  and  96.11%  average  accu¬ 
racy  (over  100  different  fidelity  sets),  respectively. 
While  the  results  of  the  binary  MBO  method  (with¬ 
out  any  zero  means  constraint)  for  the  two  moons 
data  set  achieves  slightly  better  accuracy  for  50 
and  100  fidelity  points  (being  of  97.53%  and  98.41%, 
respectively),  we  noticed  that  if  the  number  of 
fidelity  points  is  too  low,  the  method  is  unable 
to  perform  well,  as  the  results  vary  not  insignif¬ 
icantly  depending  on  the  fidelity  set.  The  same  is 
the  case  with  the  max-flow  method  with  no  balanc¬ 
ing  constraint.  For  example,  for  20  fidelity  points, 
the  average  accuracy  accuracy  we  obtained  for  the 
method  was  88.22%.  However,  with  the  balanced 
method,  we  still  obtain  a  good  result  (96.11%)  for  a 
fidelity  set  containing  as  little  as  20  points.  Thus, 
the  advantage  of  the  method  is  that  it  performs 
well  with  even  a  very  small  fidelity  region.  The  re¬ 
sults  are  summarized  in  Table  4. 

5.1.2  MNIST 

For  the  MNIST  data  set,  we  obtained  very  good 
results  even  for  a  small  number  of  fidelity  points. 
For  500,  400,  300  and  210  fidelity  points  (or  3.6%, 
2.9%,  2.2%  and  1.5%  of  the  data,  respectively), 
we  obtained  an  average  accuracy  (over  100  differ¬ 
ent  fidelity  sets)  of  98.59%,  98.48%,  98.45%  and 
98.41%,  respectively.  A  comparison  to  the  results 
of  the  regular  max-flow  method  is  in  Table  4.  Note 
that  in  addition  to  giving  at  least  a  slightly  higher 
accuracy  everywhere,  it  handles  the  case  of  a  small 


number  of  fidelity  points  better  than  the  original 
method.  For  example,  for  210  fidelity  points,  the 
method  obtained  an  accuracy  of  98.41%,  while  the 
regular  max-flow  method  achieved  a  much  lower 
accuracy  of  93.68%. 

5.1.3  Banknote  Authentication 

For  the  banknote  authentication  data  set  from 
the  UCI  Machine  Learning  Repository  [2],  we  ob¬ 
tained  reasonable  results  for  as  little  as  20  fidelity 
points.  The  results  are  shown  in  Table  4.  The  bal¬ 
ancing  constraints  method  obtains  better  results 
than  the  original  max-flow  method,  achieving  an 
average  accuracy  (over  100  different  fidelity  sets) 
of  98.55%  for  only  20  fidelity  points,  as  opposed 
to  the  accuracy  of  96.78%  of  the  original  max-flow 
method. 

5.1.4  Rods 

For  the  first  rod  data  set,  we  obtained  reason¬ 
able  results  starting  from  around  10  fidelity  points 
out  of  around  two  thousand  that  are  in  the  rods 
data  set.  For  10  to  20  fidelity  points,  the  accu¬ 
racy  was  around  around  96%.  Testing  50  and  100 
fidelity  points,  we  obtained  around  99%  accuracy. 

6  Conclusion 

We  have  described  two  convex  methods  for  data 
segmentation  using  a  graphical  framework.  The 
first  solves  a  dual  maximum  flow  problem  by  con¬ 
tinuous  optimization  techniques,  and  the  second 
method  solves  the  primal  problem  directly.  It  was 
proved  the  algorithms  were  guaranteed  to  produce 
global  minimizers  for  semi-supervised  data  segmen¬ 
tation  problems  with  two  classes.  In  case  where  the 
class  sizes  are  known  precisely  or  approximately, 
the  first  model  could  be  slightly  modified  to  pro¬ 
duce  more  stable  and  accurate  results  by  incorpo¬ 
rating  constraints  on  the  class  sizes.  Simulations 
showed  that  the  methods  were  comparable  with 
or  outperformed  the  state-of-the-art  algorithms.  In 
fact,  our  convex  models  had  the  advantage  over 
non-convex  methods  in  that  the  latter  could  oc¬ 
casionally  get  stuck  in  local  minima.  Moreover,  a 
thorough  comparison  to  a  non-convex  binary  MBO 
and  GL  method  [40]  revealed  that  the  latter  may 
not  produce  an  accurate  result  in  case  when  the 
fidelity  region  is  not  chosen  randomly,  but  that 
did  not  affect  the  proposed  convex  methods.  To 
speed  up  the  timing  of  the  algorithms,  we  made 
use  of  a  fast  numerical  solver  described  in  [1]  to 
solve  some  of  the  subproblems  involving  graph- 
based  PDFs.  Future  work  includes  an  extension 
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Table  4  Comparison  of  balancing  constraints  max-flow  method  and  regular  max-flow  method 


Number  of  fidelity  points 

50 

40 

30 

20 

Two  moons-  Max-flow  method  (regular) 

97.05% 

96.92% 

96.86% 

88.22% 

Two  moons-  Max-flow  method  (balancing  constraints) 

97.19% 

97.12% 

97.11% 

96.11% 

Number  of  fidelity  points 

500 

400 

300 

210 

MNIST-  Max-flow  method  (regular) 

98.44% 

98.40% 

98.36% 

93.68% 

MNIST-  Max-flow  method  (balancing  constraints) 

98.59% 

98.48% 

98.45% 

98.41% 

Number  of  fidelity  points 

50 

40 

30 

20 

Banknote  Authentication  Data  Set  (regular) 

98.83% 

98.72% 

98.21% 

96.78% 

Banknote  Authentication  Data  Set  (balancing  constraints) 

98.83% 

98.91% 

98.72% 

98.55% 

to  multiple  classes  and  experimentation  with  other 

ways  to  incorporate  knowledge  about  class  sizes. 

References 

1.  Anderson,  C.:  A  Rayleigh-Chebyshev  procedure 
for  finding  the  smallest  eigenvalues  and  associated 
eigenvectors  of  large  sparse  Hermitian  matrices.  J. 
Comput.  Phys.  229,  7477-7487  (2010) 

2.  Bache,  K.,  Lichman,  M.:  UCI  ma¬ 
chine  learning  repository  (2013).  URL 

http:  / /archive. ics.uci.edu/ml 

3.  Belkin,  M.,  Niyogi,  P.,  Sindhwani,  V.:  Manifold 
regularization:  A  geometric  framework  for  learning 
from  labeled  and  unlabeled  examples.  J.  Mach. 
Learn.  Res.  7,  2399-2434  (2006) 

4.  Bertozzi,  A.,  Flenner,  A.:  Diffuse  interface  mod¬ 
els  on  graphs  for  classification  of  high  dimensional 
data.  Multiscale  Model.  Simul.  10(3),  1090-1118 
(2012) 

5.  Boykov,  Y.,  Kolmogorov,  V.:  An  experimental  com¬ 
parison  of  min-cut/max-flow  algorithms  for  energy 
minimization  in  vision.  IEEE  Trans.  Pattern  Anal. 
Mach.  Intell.  26,  359-374  (2001) 

6.  Boykov,  Y.,  Veksler,  O.,  Zabih,  R.:  Fast  approxi¬ 
mate  energy  minimization  via  graph  cuts.  IEEE 
Trans.  Pattern  Anal.  Mach.  Intell.  23(11),  1222  - 
1239  (2001) 

7.  Bresson,  X.,  Esedoglu,  S.,  Vandergheynst,  P.,  Thi- 
ran,  J.,  Osher,  S.:  Fast  global  minimization  of  the 
active  contour/snake  model.  J.  Math  Imaging  Vis. 
28(2),  151-167  (2007) 

8.  Bresson,  X.,  Laurent,  T.,  Uminsky,  D.,  von 
Brecht,  J.H.:  Convergence  and  energy  landscape  for 
Cheeger  cut  clustering.  Adv.  Neural  Inf.  Process. 
Syst.  25,  1394-1402  (2012) 

9.  Bresson,  X.,  Tai,  X.C.,  Chan,  T.F.,  Szlam,  A.: 
Multi-class  transductlve  learning  based  on  ;  1  re¬ 
laxations  of  cheeger  cut  and  mumford-shah-potts 
model.  Journal  of  Mathematical  Imaging  and  Vi¬ 
sion  49(1),  191-201  (2014) 

10.  Buades,  A.,  Coll,  B.,  Morel,  J.M.:  A  review  of  image 
denoising  algorithms,  with  a  new  one.  Multiscale 
Model.  Simul.  4(2),  490-530  (2005) 

11.  Chan,  T.F.,  Esedoglu,  S.,  Nikolova,  M.:  Algorithms 
for  finding  global  mlnimlzers  of  image  segmentation 
and  denoising  models.  SIAM  J.  Appl.  Math.  66(5), 
1632-1648  (2006) 

12.  Chapelle,  O.,  Scholkopf,  B.,  Zien,  A.:  Semi- 
Supervised  Learning,  vol.  2.  MIT  Press  (2006) 

13.  Cheeger,  J.:  A  lower  bound  for  the  smallest  eigen¬ 
value  of  the  Laplaclan.  Princeton  University  Press 
(1970) 


14.  Chung,  F.:  Spectral  Graph  Theory,  vol.  92.  Ameri¬ 
can  Mathematical  Society  (1997) 

15.  Cire§an,  D.,  Meier,  U.,  Masci,  J.,  Gambardella,  L., 
Schmidhuber,  J.:  Flexible,  high  performance  convo¬ 
lutional  neural  networks  for  Image  classification.  In: 
Proceedings  of  the  22nd  International  Joint  Confer¬ 
ence  on  Artificial  Intelligence,  pp.  1237-1242  (2011) 

16.  Couprie,  C.,  Grady,  L.,  Najman,  L.,  Talbot,  H.: 
Power  watershed:  A  unifying  graph-based  optimiza¬ 
tion  framework.  IEEE  Trans.  Pattern  Anal.  Mach. 
Intell.  33(7),  1384-1399  (2011) 

17.  Couprie,  C.,  Grady,  L.,  Talbot,  H.,  Najman,  L.: 
Combinatorial  continuous  maximum  flow.  SIAM  J. 
Imaging  Sci.  4(3),  905-930  (2011) 

18.  Decoste,  D.,  Scholkopf,  B.:  Training  invariant  sup¬ 
port  vector  machines.  Mach.  Learn.  46(1),  161-190 
(2002) 

19.  Elmoataz,  A.,  Lezoray,  O.,  Bougleux,  S.:  Nonlocal 
discrete  regularization  on  weighted  graphs:  a  frame¬ 
work  for  image  and  manifold  processing.  IEEE 
Trans.  Image  Process.  17(7),  1047-1060  (2008) 

20.  Ford,  L.R.,  Fulkerson,  D.R.:  Flows  In  Networks. 
Princeton  University  Press  (1962) 

21.  van  Gennip,  Y.,  Bertozzi,  A.:  Gamma-convergence 
of  graph  Ginzburg-Landau  functionals.  Advanced  in 
Differential  Equations  17(11-12),  1115-1180  (2012) 

22.  van  Gennip,  Y.,  Guillen,  N.,  Ostlng,  B.,  Bertozzi, 
A.L.:  Mean  curvature,  threshold  dynamics,  and 
phase  field  theory  on  finite  graphs.  Milan  J.  Math 
(2014) 

23.  Gilboa,  G.,  Osher,  S.:  Nonlocal  operators  with  ap¬ 
plications  to  image  processing.  Multiscale  Model. 
Simul.  7(3),  1005-1028  (2008) 

24.  Goldberg,  A.V.,  Tarjan,  R.E.:  A  new  approach  to 
the  maximum-flow  problem.  J.  ACM  35(4),  921- 
940  (1988) 

25.  Goldstein,  T.,  Bresson,  X.,  Osher,  S.:  Geometric  ap¬ 
plications  of  the  split  bregman  method:  Segmenta¬ 
tion  and  surface  reconstruction.  J.  Sci.  Gomput. 
45(1),  271-293  (2010) 

26.  Goldstein,  T.,  Osher,  S.:  The  split  bregman  method 
for  11-regularized  problems.  SIAM  J.  Imaging  Sci. 
2(2),  323-343  (2009) 

27.  Grady,  L.:  Multilabel  random  walker  image  segmen¬ 
tation  using  prior  models.  In:  Proceedings  of  the 
2005  IEEE  Conference  on  Computer  Vision  and 
Pattern  Recognition  (CVPR),  pp.  763-770  (2005) 

28.  Grady,  L.:  Random  walks  for  image  segmentation. 
IEEE  Trans.  Pattern  Anal.  Mach.  Intell.  28(11), 
1768-1783  (2006) 

29.  Grady,  L.,  Polimenl,  J.R.:  Discrete  Calculus:  Ap¬ 
plied  Analysis  on  Graphs  for  Computational  Sci¬ 
ence.  Springer  (2010) 


Global  binary  optimization  on  graphs  for  classification  of  high  dimensional  data 


21 


30.  Grady,  L.,  Schiwietz,  T.,  Aharon,  S.,  Westermann, 
R.:  Random  walks  for  interactive  alpha-matting.  In: 
Proceedings  of  VHP,  pp.  423-429  (2005) 

31.  Hein,  M.,  Audibert,  J.,  Von  Luxburg,  U.:  From 
graphs  to  manifolds  -  weak  and  strong  pointwise 
consistency  of  graph  laplacians.  In:  Proceedings  of 
the  18th  Gonference  on  Learning  Theory  (COLT, 
pp.  470-485.  Springer  (2005) 

32.  Hein,  M.,  Biihler,  T.:  An  inverse  power  method 
for  nonlinear  eigenproblems  with  applications  in  1- 
spectral  clustering  and  sparse  PCA.  Adv.  Neural 
Inf.  Process.  Syst.  23,  847-855  (2010) 

33.  Hu,  H.,  Laurent,  T.,  Porter,  M.A.,  Bertozzi,  A.L.:  A 
method  based  on  total  variation  for  network  modu¬ 
larity  optimization  using  the  MBO  scheme.  SIAM 
J.  Appl.  Math.  73(6),  2224-2246  (2013) 

34.  Kegl,  B.,  Busa-Fekete,  R.:  Boosting  products  of 
base  classifiers.  In:  Proceedings  of  the  26th  Annual 
International  Conference  on  Machine  Learning,  pp. 
497-504  (2009) 

35.  Klodt,  M.,  Cremers,  D.:  A  convex  framework  for 
image  segmentation  with  moment  constraints.  In: 
IEEE  International  Conference  on  Computer  Vision 
(ICCV),  pp.  2236  -  2243  (2011) 

36.  Kolmogorov,  V.,  Zabih,  R.:  What  energy  functions 
can  be  minimized  via  graph  cuts.  IEEE  Trans.  Pat¬ 
tern  Anal.  Mach.  Intell.  26,  65-81  (2004) 

37.  LeCun,  Y.,  Bottou,  L.,  Bengio,  Y.,  Haffner,  P.: 
Gradient-based  learning  applied  to  document  recog¬ 
nition.  Proceedings  of  the  IEEE  86(11),  2278-2324 
(1998) 

38.  LeCun,  Y.,  Cortes,  C.:  The  MNIST 

database  of  handwritten  digits.  URL 

http:  / /yann. lecun.com/exdb/mnist/ 

39.  Levin,  A.,  Rav-Acha,  A.,  Lischinski,  D.:  Spectral 
matting.  IEEE  Trans.  Pattern  Anal.  Mach.  Intell. 
30(10),  1699  -1712  (2008) 

40.  Merkurjev,  E.,  Kostic,  T.,  Bertozzi,  A.:  An  MBO 
scheme  on  graphs  for  classification  and  image  pro¬ 
cessing.  SIAM  J.  Imaging  Sci.  6(4),  1903-1930 
(2013) 

41.  Merriman,  B.,  Bence,  J.K.,  Osher,  S.:  Diffusion 
generated  motion  by  mean  curvature.  AMS  Se¬ 
lected  Lectures  in  Mathematics  Series:  Compu¬ 
tational  Crystal  Growers  Workshop  89  66,  73-83 
(1992) 

42.  Mohar,  B.:  The  Laplacian  spectrum  of  graphs. 
Graph  Theory,  Combin.  Applicat.  2,  871-898 
(1991) 

43.  Mollenhoff,  T.,  Nieuwenhuis,  C.,  Toppe,  E.,  Cre¬ 
mers,  D.:  Efficient  convex  optimization  for  mini¬ 
mal  partition  problems  with  volume  constraints.  In: 
Energy  Minimization  Methods  in  Computer  Vision 
and  Pattern  Recognition,  pp.  94-107  (2013) 

44.  Perona,  P.,  Zelnik-Manor,  L.:  Self-tuning  spectral 
clustering.  Adv.  Neural  Inf.  Process.  Syst.  17,  1601- 
1608  (2004) 

45.  Setzer,  S.:  Operator  splittings,  bregman  methods 
and  frame  shrinkage  in  image  processing.  Interna¬ 
tional  Journal  of  Computer  Vision  92(3),  265-280 
(2011) 

46.  Shi,  J.,  Malik,  J.:  Normalized  cuts  and  image  seg¬ 
mentation.  IEEE  Trans.  Pattern  Anal.  Mach.  Intell. 
22(8),  888-905  (2000) 

47.  Szlam,  A.,  Bresson,  X.:  A  total  variation-based 
graph  clustering  algorithm  for  Cheeger  ratio  cuts. 
In:  Proceedings  of  the  27th  International  Confer¬ 
ence  on  Machine  Learning,  pp.  1039-1046  (2010) 

48.  Szlam,  A.D.,  Maggioni,  M.,  Coifman,  R.R.:  Regu¬ 
larization  on  graphs  with  function-adapted  diffusion 
processes.  J.  Mach.  Learn.  Res.  9,  1711-1739  (2008) 


49.  Tai,  X.C.,  Christiansen,  O.,  Lin,  P.,  Skjaelaaen,  L: 
Image  segmentation  using  some  piecewise  constant 
level  set  methods  with  mbo  type  of  projection.  In¬ 
ternational  Journal  of  Computer  Vision  73(1),  61- 
76  (2007) 

50.  Wang,  J.,  Jebara,  T.,  Chang,  S.:  Graph  transduc¬ 
tion  via  alternating  minimization.  In:  Proceedings 
of  the  25th  International  Conference  on  Machine 
Learning,  pp.  1144-1151  (2008) 

51.  Wu,  C.,  Tai,  X.C.:  Augmented  lagrangian  method, 
dual  methods,  and  split  bregman  iteration  for  rof, 
vectorial  tv,  and  high  order  models.  SIAM  J.  Imag¬ 
ing  Sci.  3(3),  300-339  (2010) 

52.  Yuan,  J.,  Bae,  E.,  Tai,  X.C.:  A  study  on  contin¬ 
uous  max-flow  and  min-cut  approaches.  In:  2010 
IEEE  Conference  on  Computer  Vision  and  Pattern 
Recognition,  pp.  2217-2224  (2010) 

53.  Yuan,  J.,  Bae,  E.,  Tai,  X.C.,  Boykov,  Y.:  A  spa¬ 
tially  continuous  max-flow  and  min-cut  framework 
for  binary  labeling  problems.  Numer.  Math.  126(3), 
559-587  (2013) 

54.  Zhou,  D.,  Bousquet,  O.,  Lai,  T.N.,  Weston,  J., 
Scholkopf,  B.:  Learning  with  local  and  global  consis¬ 
tency.  Advances  in  Neural  Information  Processing 
Systems  16,  321-328  (2004) 

55.  Zhou,  D.,  Scholkopf,  B.:  A  regularization  framework 
for  learning  from  graph  data.  In:  Workshop  on  Sta¬ 
tistical  Relational  Learning.  International  Confer¬ 
ence  on  Machine  Learning  (2004) 

56.  Zhu,  X.:  Semi-supervised  learning  literature  survey. 
Computer  Sciences  Technical  Report  1530,  Univer¬ 
sity  of  Wisconsin-Madison  (2005) 


